面部影响分析仍然是一项艰巨的任务,其设置从实验室控制到野外情况。在本文中,我们提出了新的框架,以应对第四次情感行为分析(ABAW)竞争的两个挑战:i)多任务学习(MTL)挑战和II)从合成数据(LSD)中学习挑战。对于MTL挑战,我们采用SMM-EmotionNet具有更好的特征向量策略。对于LSD挑战,我们建议采用各自的方法来应对单个标签,不平衡分布,微调限制和模型体系结构的选择。竞争的官方验证集的实验结果表明,我们提出的方法的表现优于基线。该代码可在https://github.com/sylyoung/abaw4-hust-ant上找到。
translated by 谷歌翻译
少量样本压缩旨在将大冗余模型压缩成一个小型紧凑型,只有少量样品。如果我们的微调模型直接具有这些限制的样本,模型将容易受到过度装备,并且几乎没有学习。因此,先前的方法优化压缩模型逐层,并尝试使每个层具有与教师模型中的相应层相同的输出,这是麻烦的。在本文中,我们提出了一个名为mimicking的新框架,然后替换(mir),以实现几个样本压缩,这首先促使修剪模型输出与教师在倒数第二层中的相同功能,然后在倒数第二个之前替换教师的图层调整良好的紧凑型。与以前的层面重建方法不同,我们的MIR完全优化整个网络,这不仅简单而有效,而且还无人驾驶和一般。MIR优于以前的余量。代码即将推出。
translated by 谷歌翻译
许多现代的机器学习算法由简单的私人算法组成;因此,一个越来越重要的问题是有效计算组成下的整体隐私损失。在这项研究中,我们介绍了Edgeworth会计师,这是一种分析方法,用于构成私人算法的差异隐私保证。 Edgeworth会计师首先使用$ f $ - 不同的隐私框架来无误地跟踪构图下的隐私损失,该框架使我们能够使用隐私损失log-logikelihoodhiehood(pllrs)表达隐私保证。顾名思义,该会计师接下来使用Edgeworth扩展到上下界限PLLR的总和的概率分布。此外,通过依靠一种使用简单的技术近似复杂分布的技术,我们证明了Edgeworth会计师可以应用于任何噪声加成机制的组成。由于Edgeworth扩展的某些吸引人的功能,该会计师提供的$(\ epsilon,\ delta)$ - 差异隐私范围是非反应的,基本上没有额外的计算成本,而不是先前的方法运行时间随成分的数量而增加。最后,我们证明了我们的上和下部$(\ epsilon,\ delta)$ - 差异隐私范围在联合分析和培训私人深度学习模型的某些制度中紧密。
translated by 谷歌翻译
立体声匹配是计算机愿景中的一个重要任务,这些任务是几十年来引起了巨大的研究。虽然在差距准确度,密度和数据大小方面,公共立体声数据集难以满足模型的要求。在本文中,我们的目标是解决数据集和模型之间的问题,并提出了一个具有高精度差异地面真理的大规模立体声数据集,名为Plantstereo。我们使用了半自动方式来构造数据集:在相机校准和图像配准后,可以从深度图像获得高精度视差图像。总共有812个图像对覆盖着多种植物套装:菠菜,番茄,胡椒和南瓜。我们首先在四种不同立体声匹配方法中评估了我们的Plandstereo数据集。不同模型和植物的广泛实验表明,与整数精度的基础事实相比,Plantstereo提供的高精度差异图像可以显着提高深度学习模型的培训效果。本文提供了一种可行和可靠的方法来实现植物表面密集的重建。 PlantSereo数据集和相对代码可用于:https://www.github.com/wangqingyu985/plantstereo
translated by 谷歌翻译
我们研究了$ N $节点上稳健地估计参数$ P $'ENY ACLY图的问题,其中$ \ gamma $小点的节点可能是对抗的。在展示规范估计器的缺陷之后,我们设计了一种计算上有效的频谱算法,估计$ P $高精度$ \ tilde o(\ sqrt {p(1-p)} / n + \ gamma \ sqrt {p(1-p)} / \ sqrt {n} + \ gamma / n)$ for $ \ gamma <1/60 $。此外,我们为所有$ \ Gamma <1/2 $,信息定理限制提供了一种效率低下的算法。最后,我们证明了几乎匹配的统计下限,表明我们的算法的错误是最佳的对数因子。
translated by 谷歌翻译
我们在差分隐私(DP)的约束下,用重型数据研究随机凸优化。大多数关于此问题的事先工作仅限于损耗功能是Lipschitz的情况。相反,正如王,肖,德拉达斯和徐\ Cite {wangxdx20}所引入的那样,假设渐变的分布已涉及$ k $ --th时刻,我们研究了一般凸损失功能。我们在集中DP下提供了改善的上限,用于凸起的凸起和强凸损失功能。一路上,我们在纯粹和集中的DP下获得了私人平均估计的私有平均估计的新算法。最后,我们证明了私有随机凸性优化的近乎匹配的下限,具有强凸损失和平均估计,显示纯净和浓缩的DP之间的新分离。
translated by 谷歌翻译
在本文中,我们研究了非交互性局部差异隐私(NLDP)模型中估计平滑普遍线性模型(GLM)的问题。与其经典设置不同,我们的模型允许服务器访问一些其他公共但未标记的数据。在本文的第一部分中,我们专注于GLM。具体而言,我们首先考虑每个数据记录均为I.I.D.的情况。从零均值的多元高斯分布中取样。由Stein的引理动机,我们提出了GLMS的$(Epsilon,\ delta)$ -NLDP算法。此外,算法的公共数据和私人数据的示例复杂性以实现$ \ alpha $的$ \ ell_2 $ -norm估计错误(具有高概率)为$ {o}(p \ alpha^{ - 2})$和$ \ tilde {o}(p^3 \ alpha^{ - 2} \ epsilon^{ - 2})$,其中$ p $是特征向量的维度。这是对$ \ alpha^{ - 1} $中先前已知的指数或准过程的重大改进,或者在$ p $中的指数smack sample sample smack glms的复杂性,没有公共数据。然后,我们考虑一个更通用的设置,每个数据记录为I.I.D.从某些次高斯分布中取样,有限制的$ \ ell_1 $ -norm。基于Stein的引理的变体,我们提出了一个$(\ epsilon,\ delta)$ - NLDP算法,用于GLMS的公共和私人数据的样本复杂性,以实现$ \ ell_ \ elfty $ - infty $ -NOMM估计的$ \ alpha误差$是$ is $ {o}(p^2 \ alpha^{ - 2})$和$ \ tilde {o}(p^2 \ alpha^{ - 2} \ epsilon^{ - 2})$,温和的假设,如果$ \ alpha $不太小({\ em i.e.,} $ \ alpha \ geq \ omega(\ frac {1} {\ sqrt {p}}})$)。在本文的第二部分中,我们将我们的想法扩展到估计非线性回归的问题,并显示出与多元高斯和次高斯案例的GLMS相似的结果。最后,我们通过对合成和现实世界数据集的实验来证明算法的有效性。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
Weakly-supervised object localization aims to indicate the category as well as the scope of an object in an image given only the image-level labels. Most of the existing works are based on Class Activation Mapping (CAM) and endeavor to enlarge the discriminative area inside the activation map to perceive the whole object, yet ignore the co-occurrence confounder of the object and context (e.g., fish and water), which makes the model inspection hard to distinguish object boundaries. Besides, the use of CAM also brings a dilemma problem that the classification and localization always suffer from a performance gap and can not reach their highest accuracy simultaneously. In this paper, we propose a casual knowledge distillation method, dubbed KD-CI-CAM, to address these two under-explored issues in one go. More specifically, we tackle the co-occurrence context confounder problem via causal intervention (CI), which explores the causalities among image features, contexts, and categories to eliminate the biased object-context entanglement in the class activation maps. Based on the de-biased object feature, we additionally propose a multi-teacher causal distillation framework to balance the absorption of classification knowledge and localization knowledge during model training. Extensive experiments on several benchmarks demonstrate the effectiveness of KD-CI-CAM in learning clear object boundaries from confounding contexts and addressing the dilemma problem between classification and localization performance.
translated by 谷歌翻译
Witnessing the impressive achievements of pre-training techniques on large-scale data in the field of computer vision and natural language processing, we wonder whether this idea could be adapted in a grab-and-go spirit, and mitigate the sample inefficiency problem for visuomotor driving. Given the highly dynamic and variant nature of the input, the visuomotor driving task inherently lacks view and translation invariance, and the visual input contains massive irrelevant information for decision making, resulting in predominant pre-training approaches from general vision less suitable for the autonomous driving task. To this end, we propose PPGeo (Policy Pre-training via Geometric modeling), an intuitive and straightforward fully self-supervised framework curated for the policy pretraining in visuomotor driving. We aim at learning policy representations as a powerful abstraction by modeling 3D geometric scenes on large-scale unlabeled and uncalibrated YouTube driving videos. The proposed PPGeo is performed in two stages to support effective self-supervised training. In the first stage, the geometric modeling framework generates pose and depth predictions simultaneously, with two consecutive frames as input. In the second stage, the visual encoder learns driving policy representation by predicting the future ego-motion and optimizing with the photometric error based on current visual observation only. As such, the pre-trained visual encoder is equipped with rich driving policy related representations and thereby competent for multiple visuomotor driving tasks. Extensive experiments covering a wide span of challenging scenarios have demonstrated the superiority of our proposed approach, where improvements range from 2% to even over 100% with very limited data. Code and models will be available at https://github.com/OpenDriveLab/PPGeo.
translated by 谷歌翻译